Skip to content

Adds support for Cholla's new Concatenation Format#5170

Open
mabruzzo wants to merge 22 commits into
yt-project:mainfrom
mabruzzo:cholla-frontend-newformat
Open

Adds support for Cholla's new Concatenation Format#5170
mabruzzo wants to merge 22 commits into
yt-project:mainfrom
mabruzzo:cholla-frontend-newformat

Conversation

@mabruzzo
Copy link
Copy Markdown
Contributor

@mabruzzo mabruzzo commented May 21, 2025

PR Summary

This PR adds support for loading Cholla datasets that are:

  • distributed over multiple files
  • stored in Cholla's historical concatenation datasets

For context:

  • when Cholla writes out snapshots, it writes each block to a separate file.
  • The Cholla repository provide python scripts to concatenate data, which has historically concatenate all of the data so that the resulting file would appear like there was only 1 process that evolved a single gigantic block. These scripts were commonly used to simplify data analysis.
  • Until this PR, the yt-frontend has only known how to load datasets after they were concatenated in this manner. This introduced challenges because:
    1. the yt-frontend must load everything at once (introducing obvious memory issues)
    2. in cases where the number of cells exceeds 2^31 yt would simply break (this came up on large-memory nodes)

This PR introduces support for reading Cholla datasets that are either:

  • distributed (i.e. they are not concatenated), OR
  • use Cholla's newly added concatenation format.
    • I introduced the new concatenation format with the intent of improving performance when using yt (Introduce new organizational schema format for storing Cholla data cholla-hydro/cholla#427). Essentially the new format organizes data in a manner based on the underlying block structure (in a similar manner to what an AMR simulation might do). This let's us reuse a lot of the machinery for reading distributed datasets
    • It's important that the yt-frontend can read in this format because (i) it provides some performance advantages (on parallel filesystems) and (ii) we have scripts to repackage datasets using the old concatenation format so that they now have this new format.

Aside: This PR superseded PR #4702

PR Checklist

  • New features are documented, with docstrings and narrative docs

@neutrinoceros neutrinoceros added enhancement Making something better code frontends Things related to specific frontends frontend: cholla labels Jun 26, 2025
Comment thread yt/frontends/cholla/data_structures.py Outdated
Comment thread yt/frontends/cholla/data_structures.py Outdated
Comment thread yt/frontends/cholla/data_structures.py Outdated
Comment thread yt/frontends/cholla/data_structures.py Outdated
Co-authored-by: Clément Robert <cr52@protonmail.com>
Comment thread yt/frontends/cholla/io.py Outdated
Comment thread yt/frontends/cholla/io.py Outdated
mabruzzo and others added 3 commits June 27, 2025 08:31
Co-authored-by: Clément Robert <cr52@protonmail.com>
@mabruzzo
Copy link
Copy Markdown
Contributor Author

I lightly refactored a little code

I realized that _split_fname_procid_suffix was no longer used for
anything. I think it's nice to have a function that explicitly describes
the relationship between the filename passed into yt.load and the
filename template. Consequently I replaced this function with a new
function called `_infer_fname_template` that we call from
`_determine_data_layout` (the contents didn't change dramatically)
@mabruzzo mabruzzo force-pushed the cholla-frontend-newformat branch from d54d19b to bbf529f Compare August 26, 2025 17:10
@mabruzzo
Copy link
Copy Markdown
Contributor Author

@neutrinoceros, @matthewturk -- I just wanted to ping you to see if there is anything I can do to make this PR easier to review. I would love to see this get merged in time for the 4.5 release (whenever that is)

-> this testing strategy creates synthetic data and performs a
   round-trip test to confirm that the data is loaded properly
This is in anticipation of a followup PR adding support for particles
@matthewturk
Copy link
Copy Markdown
Member

@mabruzzo I am so sorry for not seeing and responding sooner.

I have reviewed this and it looks really good. I'm OK with either doing another pass or putting it in. Thank you for your hard work on it.

@mabruzzo
Copy link
Copy Markdown
Contributor Author

@matthewturk: Not a problem. Thanks for looking it over.

I just pushed 4 commits today because I had started to work on a follow-up PR (that adds Particle support) and I was trying to minimize how much the followup PR would change things. The 3 new commits both:

  • reorganize some of the code and factor out an idiom I was using multiple times (related to opening and closing hdf5 files)
  • added some new unit tests to actually test a bunch of functionality (it generates synthetic datasets in different formats and then tries to load them)

Unfortunately, these tests aren't actually passing on CI (Frankly, I don't know why -- they work when I run pytest yt/frontends/cholla/tests/test_load.py on my local machine...) If you can spot an obvious mistake that I'm making, I'm happy to try to fix it. Otherwise, I'm happy to move these 4 commits to a separate PR

@mabruzzo
Copy link
Copy Markdown
Contributor Author

Ok, I think the problem is that my new test relies upon pytest fixtures (and other machinery) and I need to somehow exclude the test from the nose test-runner

@mabruzzo
Copy link
Copy Markdown
Contributor Author

Is there an easy way to exclude the test from the nose test runner (but not the pytest runner?)

@neutrinoceros
Copy link
Copy Markdown
Member

Yes. Add it to nose_ignores.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

code frontends Things related to specific frontends enhancement Making something better frontend: cholla

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants